Now that we understand the logic of spatial relationship queries, let’s take a look at another fundamental spatial operation that relies on them.
This operation, called a spatial join, is the process by which we can leverage the spatial relationships between distinct datasets to merge their information into a new, synthetic dataset.
This operation can be thought as the spatial equivalent of an attribute join, in which multiple tabular datasets can be merged by aligning matching values in a common column that they both contain. Thus, we’ll start by developing an understanding of this operation first!
Instructor Notes
library(sf)
library(tmap)
Let’s read in a table of data from the US Census’ 5-year American Community Survey (ACS5).
# Read in the ACS5 data for CA into an `sf` object.
# Note: We force the FIPS_11_digit to be read in as a string to preserve any leading zeroes.
acs5_df = read.csv("notebook_data/census/ACS5yr/census_variables_CA.csv")
head(acs5_df)
## NAME c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4012, Alameda County, California 2456 1287 476 259 283
## 2 Census Tract 4013, Alameda County, California 3983 845 1348 827 796
## 3 Census Tract 4014, Alameda County, California 4340 713 1902 593 981
## 4 Census Tract 4015, Alameda County, California 2080 563 1064 215 190
## 5 Census Tract 4016, Alameda County, California 1889 324 960 247 274
## 6 Census Tract 4017, Alameda County, California 2544 553 634 302 945
## c_race_moe c_white_moe c_black_moe c_asian_moe c_latinx_moe state_fips county_fips
## 1 213 191 116 124 195 6 1
## 2 680 186 411 283 236 6 1
## 3 644 314 440 198 620 6 1
## 4 369 222 283 116 105 6 1
## 5 400 135 376 164 149 6 1
## 6 314 140 259 144 247 6 1
## tract_fips med_rent med_hhinc c_tenants c_owners c_renters med_rent_moe med_hhinc_moe
## 1 401200 1251 62064 1245 544 701 128 14598
## 2 401300 927 35030 1743 213 1530 64 6917
## 3 401400 884 28264 1447 274 1173 64 5802
## 4 401500 718 38684 992 449 543 80 12077
## 5 401600 1088 32663 707 140 567 79 16290
## 6 401700 1147 63052 1083 398 685 214 19070
## c_tenants_moe c_owners_moe c_renters_moe c_movers c_stay c_movelocal c_movecounty
## 1 60 94 91 2448 1995 253 143
## 2 87 69 104 3978 2434 1114 252
## 3 106 75 125 4269 3448 699 76
## 4 82 96 114 2080 1750 211 112
## 5 88 61 90 1860 1545 148 153
## 6 95 85 116 2535 2001 350 116
## c_movestate c_moveabroad c_movers_moe c_stay_moe c_movelocal_moe c_movecounty_moe
## 1 25 32 213 188 187 66
## 2 90 88 680 324 512 115
## 3 27 19 605 618 246 52
## 4 7 0 369 353 112 103
## 5 4 10 393 368 89 149
## 6 34 34 312 282 193 62
## c_movestate_moe c_moveabroad_moe c_commute c_car c_carpool c_transit c_bike c_walk
## 1 39 31 1460 805 94 276 122 85
## 2 75 98 2046 698 223 801 37 214
## 3 34 26 1595 751 34 408 186 163
## 4 11 12 982 493 89 226 47 17
## 5 6 16 603 344 74 107 38 0
## 6 47 39 1585 648 284 373 21 29
## c_commute_moe c_car_moe c_carpool_moe c_transit_moe c_bike_moe c_walk_moe year
## 1 191 144 109 105 60 43 2013
## 2 468 183 133 264 36 121 2013
## 3 396 211 28 179 108 151 2013
## 4 222 145 67 82 34 15 2013
## 5 170 128 74 120 44 12 2013
## 6 225 168 124 141 22 44 2013
## FIPS_11_digit p_white p_black p_asian p_latinx p_owners p_renters p_stay
## 1 6001401200 0.5240228 0.1938111 0.1054560 0.11522801 0.4369478 0.5630522 0.8149510
## 2 6001401300 0.2121516 0.3384384 0.2076324 0.19984936 0.1222031 0.8777969 0.6118653
## 3 6001401400 0.1642857 0.4382488 0.1366359 0.22603687 0.1893573 0.8106427 0.8076833
## 4 6001401500 0.2706731 0.5115385 0.1033654 0.09134615 0.4526210 0.5473790 0.8413462
## 5 6001401600 0.1715193 0.5082054 0.1307570 0.14505029 0.1980198 0.8019802 0.8306452
## 6 6001401700 0.2173742 0.2492138 0.1187107 0.37146226 0.3674977 0.6325023 0.7893491
## p_movelocal p_movecounty p_movestate p_moveabroad p_car p_carpool p_transit p_bike
## 1 0.10334967 0.05841503 0.010212418 0.013071895 0.5513699 0.06438356 0.1890411 0.08356164
## 2 0.28004022 0.06334842 0.022624434 0.022121669 0.3411535 0.10899316 0.3914956 0.01808407
## 3 0.16373858 0.01780276 0.006324666 0.004450691 0.4708464 0.02131661 0.2557994 0.11661442
## 4 0.10144231 0.05384615 0.003365385 0.000000000 0.5020367 0.09063136 0.2301426 0.04786151
## 5 0.07956989 0.08225806 0.002150538 0.005376344 0.5704809 0.12271973 0.1774461 0.06301824
## 6 0.13806706 0.04575937 0.013412229 0.013412229 0.4088328 0.17917981 0.2353312 0.01324921
## p_walk
## 1 0.05821918
## 2 0.10459433
## 3 0.10219436
## 4 0.01731161
## 5 0.00000000
## 6 0.01829653
Brief summary of the data:
Below is a table of the variables in this table. They were combined from different ACS 5 year tables.
NOTE: - variables that start with c_ are counts - variables that start with med_ are medians - variables that end in _moe are margin of error estimates - variables that start with _p are proportions calcuated from the counts divided by the table denominator (the total count for whom that variable was assessed)
| Variable | Description |
|---|---|
c_race |
Total population |
c_white |
Total white non-Latinx |
c_black |
Total black and African American non-Latinx |
c_asian |
Total Asian non-Latinx |
c_latinx |
Total Latinx |
state_fips |
State level FIPS code |
county_fips |
County level FIPS code |
tract_fips |
Tracts level FIPS code |
med_rent |
Median rent |
med_hhinc |
Median household income |
c_tenants |
Total tenants |
c_owners |
Total owners |
c_renters |
Total renters |
c_movers |
Total number of people who moved |
c_stay |
Total number of people who stayed |
c_movelocal |
Number of people who moved locally |
c_movecounty |
Number of people who moved counties |
c_movestate |
Number of people who moved states |
c_moveabroad |
Number of people who moved abroad |
c_commute |
Total number of commuters |
c_car |
Number of commuters who use a car |
c_carpool |
Number of commuters who carpool |
c_transit |
Number of commuters who use public transit |
c_bike |
Number of commuters who bike |
c_walk |
Number of commuters who bike |
year |
ACS data year |
FIPS_11_digit |
11-digit FIPS code |
We’re going to drop all of our moe columns by identifying all of those that end with _moe. We can do that in two steps, first by using filter to identify columns that contain the string _moe.
tidyverse will help with this!
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.3.2 ✔ purrr 0.3.4
## ✔ tibble 3.0.3 ✔ dplyr 1.0.2
## ✔ tidyr 1.1.1 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
acs5_df = acs5_df %>% select(-contains("_moe"))
Unfortunately, when this dataset reads in, the 11-digit FIPS codes that should be strings actually read in as numerics, and thus the leading 0 gets truncated. We’re going to need those FIPS code in the correct format later, so let’s reformat them now.
# recast the FIPS 11-digit codes as strings, pasting a 0 at the front of each
acs5_df$FIPS_11_digit = paste0('0', acs5_df$FIPS_11_digit)
And lastly, let’s grab only the rows for year 2018 and county FIPS code 1 (i.e. Alameda County)
acs5_df_ac = acs5_df[acs5_df$year==2018 & acs5_df$county_fips==1, ]
head(acs5_df_ac)
## NAME c_race c_white c_black c_asian c_latinx
## 8324 Census Tract 4415.01, Alameda County, California 6570 677 111 4740 570
## 8325 Census Tract 4047, Alameda County, California 2079 1515 134 199 175
## 8326 Census Tract 4425, Alameda County, California 7748 1430 375 3379 1904
## 8327 Census Tract 4503, Alameda County, California 5301 2597 96 1077 1315
## 8328 Census Tract 4506.07, Alameda County, California 5971 2832 324 1726 804
## 8329 Census Tract 4049, Alameda County, California 3973 2383 394 442 337
## state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters
## 8324 6 1 441501 1883 152394 1796 1634 162
## 8325 6 1 404700 3250 181000 790 680 110
## 8326 6 1 442500 1999 95323 2264 1245 1019
## 8327 6 1 450300 2626 121131 1814 1249 565
## 8328 6 1 450607 1837 96061 2445 883 1562
## 8329 6 1 404900 1446 111846 1874 1095 779
## c_movers c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car
## 8324 6491 6010 257 68 129 27 2317 1765
## 8325 2043 1822 58 77 65 21 1075 572
## 8326 7628 6858 557 29 23 161 3330 2382
## 8327 5249 4668 134 326 114 7 2819 2112
## 8328 5905 4735 715 254 201 0 3315 2316
## 8329 3939 3647 123 100 34 35 2211 1030
## c_carpool c_transit c_bike c_walk year FIPS_11_digit p_white p_black p_asian
## 8324 264 127 28 8 2018 06001441501 0.1030441 0.01689498 0.7214612
## 8325 191 170 7 6 2018 06001404700 0.7287157 0.06445406 0.0957191
## 8326 375 214 59 34 2018 06001442500 0.1845638 0.04839959 0.4361125
## 8327 183 265 28 33 2018 06001450300 0.4899076 0.01810979 0.2031692
## 8328 440 193 71 114 2018 06001450607 0.4742924 0.05426227 0.2890638
## 8329 303 522 48 0 2018 06001404900 0.5997986 0.09916939 0.1112509
## p_latinx p_owners p_renters p_stay p_movelocal p_movecounty p_movestate
## 8324 0.08675799 0.9097996 0.09020045 0.9258974 0.03959328 0.010476044 0.019873671
## 8325 0.08417508 0.8607595 0.13924051 0.8918257 0.02838962 0.037689672 0.031815957
## 8326 0.24574084 0.5499117 0.45008834 0.8990561 0.07302045 0.003801783 0.003015207
## 8327 0.24806640 0.6885336 0.31146637 0.8893122 0.02552867 0.062107068 0.021718423
## 8328 0.13465081 0.3611452 0.63885481 0.8018628 0.12108383 0.043014395 0.034038950
## 8329 0.08482255 0.5843116 0.41568837 0.9258695 0.03122620 0.025387154 0.008631632
## p_moveabroad p_car p_carpool p_transit p_bike p_walk
## 8324 0.004159606 0.7617609 0.11394044 0.05481226 0.012084592 0.003452741
## 8325 0.010279001 0.5320930 0.17767442 0.15813953 0.006511628 0.005581395
## 8326 0.021106450 0.7153153 0.11261261 0.06426426 0.017717718 0.010210210
## 8327 0.001333587 0.7492018 0.06491664 0.09400497 0.009932600 0.011706279
## 8328 0.000000000 0.6986425 0.13273002 0.05822021 0.021417798 0.034389140
## 8329 0.008885504 0.4658526 0.13704206 0.23609227 0.021709634 0.000000000
| Now let’s also read in our census tracts again! |
r tracts_sf = st_read("./notebook_data/census/Tracts/cb_2013_06_tract_500k.shp", ) |
## Reading layer `cb_2013_06_tract_500k' from data source `/home/drew/Desktop/stuff/berk/dlab/Geospatial-Fundamentals-in-R-with-sf/rewrite/notebook_data/census/Tracts/cb_2013_06_tract_500k.shp' using driver `ESRI Shapefile' ## Simple feature collection with 8043 features and 9 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -124.4096 ymin: 32.53416 xmax: -114.1312 ymax: 42.00952 ## geographic CRS: NAD83 |
r head(tracts_sf) |
## Simple feature collection with 6 features and 9 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -122.3049 ymin: 37.76456 xmax: -122.2074 ymax: 37.84851 ## geographic CRS: NAD83 ## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ALAND AWATER ## 1 06 001 400300 1400000US06001400300 06001400300 4003 CT 1105329 0 ## 2 06 001 400900 1400000US06001400900 06001400900 4009 CT 420877 0 ## 3 06 001 402200 1400000US06001402200 06001402200 4022 CT 712082 0 ## 4 06 001 402800 1400000US06001402800 06001402800 4028 CT 398311 0 ## 5 06 001 404800 1400000US06001404800 06001404800 4048 CT 628405 0 ## 6 06 001 406100 1400000US06001406100 06001406100 4061 CT 1843685 74875 ## geometry ## 1 MULTIPOLYGON (((-122.2642 3... ## 2 MULTIPOLYGON (((-122.2856 3... ## 3 MULTIPOLYGON (((-122.304 37... ## 4 MULTIPOLYGON (((-122.276 37... ## 5 MULTIPOLYGON (((-122.2182 3... ## 6 MULTIPOLYGON (((-122.2387 3... |
r tracts_sf_ac = tracts_sf[tracts_sf$COUNTYFP == '001',] plot(tracts_sf_ac$geometry) |
| # 7.1 Attribute Joins |
Attribute Joins between sf data.frames and plain data.frames |
| We just mapped the census tracts. But what makes a map powerful is when you map the data associated with the locations. |
- tracts_sf_ac: These are polygon data in an sf data.frame. However, as we saw in the head of that dataset, they no attributes of interest! |
- acs5_df_ac: These are 2018 ACS data from a CSV file (‘census_variables_CA.csv’), imported and read in as a plain data.frame. However, they have no geometries! |
In order to map the ACS data we need to associate it with the tracts. Let’s do that now, by joining the columns from acs5_df_ac to the columns of tracts_gdf_ac using a common column as the key for matching rows. This process is called an attribute join. |
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left >
The image above gives us a nice conceptual summary of the types of joins we could run.
(NOTE: You can read more about merging sf and plain data.frames here.)
Okay, here we go!
Let’s take a look at the common column in both our data.frames.
head(tracts_sf_ac['GEOID'])
## Simple feature collection with 6 features and 1 field
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -122.3049 ymin: 37.76456 xmax: -122.2074 ymax: 37.84851
## geographic CRS: NAD83
## GEOID geometry
## 1 06001400300 MULTIPOLYGON (((-122.2642 3...
## 2 06001400900 MULTIPOLYGON (((-122.2856 3...
## 3 06001402200 MULTIPOLYGON (((-122.304 37...
## 4 06001402800 MULTIPOLYGON (((-122.276 37...
## 5 06001404800 MULTIPOLYGON (((-122.2182 3...
## 6 06001406100 MULTIPOLYGON (((-122.2387 3...
head(acs5_df_ac['FIPS_11_digit'])
## FIPS_11_digit
## 8324 06001441501
## 8325 06001404700
## 8326 06001442500
## 8327 06001450300
## 8328 06001450607
## 8329 06001404900
Note that they are not named the same thing.
That's okay! We just need to know that they contain the same information.
Also note that they are not in the same order.
That's not only okay... That's the point! (If they were in the same order already then we could just join them side by side, without having R find and line up the matching rows from each!)
Let’s do a left join to keep all of the census tracts in Alameda County and only the ACS data for those tracts.
NOTE: To figure out how to do this we could always take a peek at the documentation by calling ?base::merge.
?base::merge
# Left join keeps all tracts and the acs data for those tracts
tracts_acs_sf_ac = base::merge(tracts_sf_ac, acs5_df_ac, by.x = 'GEOID', by.y = "FIPS_11_digit", all.x=TRUE)
head(tracts_acs_sf_ac)
## Simple feature collection with 6 features and 52 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
## GEOID STATEFP COUNTYFP TRACTCE AFFGEOID NAME.x LSAD ALAND AWATER
## 1 06001400100 06 001 400100 1400000US06001400100 4001 CT 6894334 0
## 2 06001400200 06 001 400200 1400000US06001400200 4002 CT 587453 0
## 3 06001400300 06 001 400300 1400000US06001400300 4003 CT 1105329 0
## 4 06001400400 06 001 400400 1400000US06001400400 4004 CT 714729 0
## 5 06001400500 06 001 400500 1400000US06001400500 4005 CT 590307 0
## 6 06001400600 06 001 400600 1400000US06001400600 4006 CT 297856 0
## NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4001, Alameda County, California 3115 2055 128 592 104
## 2 Census Tract 4002, Alameda County, California 2025 1436 59 183 178
## 3 Census Tract 4003, Alameda County, California 5000 3458 380 535 364
## 4 Census Tract 4004, Alameda County, California 3843 2551 229 373 420
## 5 Census Tract 4005, Alameda County, California 3786 1885 990 264 334
## 6 Census Tract 4006, Alameda County, California 1638 817 343 144 124
## state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1 6 1 400100 3501 200893 1297 1158 139 3084
## 2 6 1 400200 1902 160536 855 532 323 2012
## 3 6 1 400300 1481 94732 2441 1057 1384 4948
## 4 6 1 400400 1624 113036 1750 653 1097 3754
## 5 6 1 400500 1627 103846 1622 605 1017 3745
## 6 6 1 400600 1640 127232 657 283 374 1587
## c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1 2514 202 120 219 29 1542 864 165
## 2 1827 40 46 39 60 1211 500 40
## 3 4159 223 289 156 121 2975 1252 177
## 4 3440 113 133 46 22 2293 933 240
## 5 3065 309 180 145 46 2325 983 156
## 6 1221 152 146 68 0 1022 370 120
## c_transit c_bike c_walk year p_white p_black p_asian p_latinx p_owners p_renters
## 1 271 0 10 2018 0.6597111 0.04109149 0.19004815 0.03338684 0.8928296 0.1071704
## 2 426 62 57 2018 0.7091358 0.02913580 0.09037037 0.08790123 0.6222222 0.3777778
## 3 835 202 171 2018 0.6916000 0.07600000 0.10700000 0.07280000 0.4330193 0.5669807
## 4 588 188 92 2018 0.6638043 0.05958886 0.09705959 0.10928962 0.3731429 0.6268571
## 5 619 177 94 2018 0.4978870 0.26148970 0.06973059 0.08821976 0.3729963 0.6270037
## 6 268 93 3 2018 0.4987790 0.20940171 0.08791209 0.07570208 0.4307458 0.5692542
## p_stay p_movelocal p_movecounty p_movestate p_moveabroad p_car p_carpool p_transit
## 1 0.8151751 0.06549935 0.03891051 0.07101167 0.009403372 0.5603113 0.10700389 0.1757458
## 2 0.9080517 0.01988072 0.02286282 0.01938370 0.029821074 0.4128819 0.03303055 0.3517754
## 3 0.8405416 0.04506871 0.05840744 0.03152789 0.024454325 0.4208403 0.05949580 0.2806723
## 4 0.9163559 0.03010123 0.03542888 0.01225360 0.005860416 0.4068905 0.10466638 0.2564326
## 5 0.8184246 0.08251001 0.04806409 0.03871829 0.012283044 0.4227957 0.06709677 0.2662366
## 6 0.7693762 0.09577820 0.09199748 0.04284814 0.000000000 0.3620352 0.11741683 0.2622309
## p_bike p_walk geometry
## 1 0.00000000 0.006485084 MULTIPOLYGON (((-122.2469 3...
## 2 0.05119736 0.047068538 MULTIPOLYGON (((-122.2574 3...
## 3 0.06789916 0.057478992 MULTIPOLYGON (((-122.2642 3...
## 4 0.08198866 0.040122111 MULTIPOLYGON (((-122.2618 3...
## 5 0.07612903 0.040430108 MULTIPOLYGON (((-122.2694 3...
## 6 0.09099804 0.002935421 MULTIPOLYGON (((-122.2681 3...
Let’s check that we have all the variables we have in our dataset now.
colnames(tracts_acs_sf_ac)
## [1] "GEOID" "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" "NAME.x"
## [7] "LSAD" "ALAND" "AWATER" "NAME.y" "c_race" "c_white"
## [13] "c_black" "c_asian" "c_latinx" "state_fips" "county_fips" "tract_fips"
## [19] "med_rent" "med_hhinc" "c_tenants" "c_owners" "c_renters" "c_movers"
## [25] "c_stay" "c_movelocal" "c_movecounty" "c_movestate" "c_moveabroad" "c_commute"
## [31] "c_car" "c_carpool" "c_transit" "c_bike" "c_walk" "year"
## [37] "p_white" "p_black" "p_asian" "p_latinx" "p_owners" "p_renters"
## [43] "p_stay" "p_movelocal" "p_movecounty" "p_movestate" "p_moveabroad" "p_car"
## [49] "p_carpool" "p_transit" "p_bike" "p_walk" "geometry"
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left >
It’s always important to run sanity checks on our results, at each step of the way!
In this case, how many rows and columns should we have?
print("Rows and columns in the Alameda County Census tract gdf:")
## [1] "Rows and columns in the Alameda County Census tract gdf:"
print(dim(tracts_sf_ac))
## [1] 361 10
print("Row and columns in the ACS5 2018 data:")
## [1] "Row and columns in the ACS5 2018 data:"
print(dim(acs5_df_ac))
## [1] 361 44
print("Rows and columns in the Alameda County Census tract gdf joined to the ACS data:")
## [1] "Rows and columns in the Alameda County Census tract gdf joined to the ACS data:"
print(dim(tracts_acs_sf_ac))
## [1] 361 53
Let’s save out our merged data so we can use it in the final notebook.
st_write(tracts_acs_sf_ac, './outdata/tracts_acs_gdf_ac.json', driver='GeoJSON', delete_dsn=T)
## Deleting source `./outdata/tracts_acs_gdf_ac.json' using driver `GeoJSON'
## Writing layer `tracts_acs_gdf_ac' to data source `./outdata/tracts_acs_gdf_ac.json' using driver `GeoJSON'
## Writing 361 features with 52 fields and geometry type Multi Polygon.
We can now make choropleth maps using our attribute joined geodataframe. Go ahead and pick one variable to color the map, then map it using tmap (since it’s too easy using the plot method). You can go back to lesson 5 if you need a refresher on how to make this!
To see the solution, double-click the Markdown cell below.
head(tracts_acs_sf_ac)
## Simple feature collection with 6 features and 52 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
## GEOID STATEFP COUNTYFP TRACTCE AFFGEOID NAME.x LSAD ALAND AWATER
## 1 06001400100 06 001 400100 1400000US06001400100 4001 CT 6894334 0
## 2 06001400200 06 001 400200 1400000US06001400200 4002 CT 587453 0
## 3 06001400300 06 001 400300 1400000US06001400300 4003 CT 1105329 0
## 4 06001400400 06 001 400400 1400000US06001400400 4004 CT 714729 0
## 5 06001400500 06 001 400500 1400000US06001400500 4005 CT 590307 0
## 6 06001400600 06 001 400600 1400000US06001400600 4006 CT 297856 0
## NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4001, Alameda County, California 3115 2055 128 592 104
## 2 Census Tract 4002, Alameda County, California 2025 1436 59 183 178
## 3 Census Tract 4003, Alameda County, California 5000 3458 380 535 364
## 4 Census Tract 4004, Alameda County, California 3843 2551 229 373 420
## 5 Census Tract 4005, Alameda County, California 3786 1885 990 264 334
## 6 Census Tract 4006, Alameda County, California 1638 817 343 144 124
## state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1 6 1 400100 3501 200893 1297 1158 139 3084
## 2 6 1 400200 1902 160536 855 532 323 2012
## 3 6 1 400300 1481 94732 2441 1057 1384 4948
## 4 6 1 400400 1624 113036 1750 653 1097 3754
## 5 6 1 400500 1627 103846 1622 605 1017 3745
## 6 6 1 400600 1640 127232 657 283 374 1587
## c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1 2514 202 120 219 29 1542 864 165
## 2 1827 40 46 39 60 1211 500 40
## 3 4159 223 289 156 121 2975 1252 177
## 4 3440 113 133 46 22 2293 933 240
## 5 3065 309 180 145 46 2325 983 156
## 6 1221 152 146 68 0 1022 370 120
## c_transit c_bike c_walk year p_white p_black p_asian p_latinx p_owners p_renters
## 1 271 0 10 2018 0.6597111 0.04109149 0.19004815 0.03338684 0.8928296 0.1071704
## 2 426 62 57 2018 0.7091358 0.02913580 0.09037037 0.08790123 0.6222222 0.3777778
## 3 835 202 171 2018 0.6916000 0.07600000 0.10700000 0.07280000 0.4330193 0.5669807
## 4 588 188 92 2018 0.6638043 0.05958886 0.09705959 0.10928962 0.3731429 0.6268571
## 5 619 177 94 2018 0.4978870 0.26148970 0.06973059 0.08821976 0.3729963 0.6270037
## 6 268 93 3 2018 0.4987790 0.20940171 0.08791209 0.07570208 0.4307458 0.5692542
## p_stay p_movelocal p_movecounty p_movestate p_moveabroad p_car p_carpool p_transit
## 1 0.8151751 0.06549935 0.03891051 0.07101167 0.009403372 0.5603113 0.10700389 0.1757458
## 2 0.9080517 0.01988072 0.02286282 0.01938370 0.029821074 0.4128819 0.03303055 0.3517754
## 3 0.8405416 0.04506871 0.05840744 0.03152789 0.024454325 0.4208403 0.05949580 0.2806723
## 4 0.9163559 0.03010123 0.03542888 0.01225360 0.005860416 0.4068905 0.10466638 0.2564326
## 5 0.8184246 0.08251001 0.04806409 0.03871829 0.012283044 0.4227957 0.06709677 0.2662366
## 6 0.7693762 0.09577820 0.09199748 0.04284814 0.000000000 0.3620352 0.11741683 0.2622309
## p_bike p_walk geometry
## 1 0.00000000 0.006485084 MULTIPOLYGON (((-122.2469 3...
## 2 0.05119736 0.047068538 MULTIPOLYGON (((-122.2574 3...
## 3 0.06789916 0.057478992 MULTIPOLYGON (((-122.2642 3...
## 4 0.08198866 0.040122111 MULTIPOLYGON (((-122.2618 3...
## 5 0.07612903 0.040430108 MULTIPOLYGON (((-122.2694 3...
## 6 0.09099804 0.002935421 MULTIPOLYGON (((-122.2681 3...
# YOUR CODE HERE
| # 7.2 Spatial Joins |
| Great! We’ve wrapped our heads around the concept of an attribute join. |
| Now let’s extend that concept to its spatially explicit equivalent: the spatial join! |
| To start, we’ll read in some other data: The Alameda County schools data. |
Then we’ll work with that data and our tracts_acs_sf_ac data together. |
r schools_df = read.csv('notebook_data/alco_schools.csv') schools_sf = st_as_sf(schools_df, coords = c('X', 'Y'), crs=4326) |
Let’s check if we have to transform the schools to match thetracts_acs_sf_ac’s CRS. |
r print('schools_sf CRS:') |
## [1] "schools_sf CRS:" |
r print(st_crs(schools_sf)) |
## Coordinate Reference System: ## User input: EPSG:4326 ## wkt: ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["geodetic latitude (Lat)",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["geodetic longitude (Lon)",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## USAGE[ ## SCOPE["unknown"], ## AREA["World"], ## BBOX[-90,-180,90,180]], ## ID["EPSG",4326]] |
r print('tracts_acs_sf_ac CRS:') |
## [1] "tracts_acs_sf_ac CRS:" |
r print(st_crs(tracts_acs_sf_ac)) |
## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]] |
| Yes we do! Let’s do that. |
| NOTE: Explicit syntax aiming at that dataset’s CRS leaves less room for human error! |
| ```r schools_sf = st_transform(schools_sf, st_crs(tracts_acs_sf_ac)) |
| print(‘schools_sf CRS:’) ``` |
## [1] "schools_sf CRS:" |
r print(st_crs(schools_sf)) |
## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]] |
r print('tracts_acs_sf_ac CRS:') |
## [1] "tracts_acs_sf_ac CRS:" |
r print(st_crs(tracts_acs_sf_ac)) |
## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]] |
| Now we’re ready to combine the datasets in an analysis. |
| In this case, we want to get data from the census tract within which each school is located. |
| But how can we do that? The two datasets don’t share a common column to use for a join. |
r colnames(tracts_acs_sf_ac) |
## [1] "GEOID" "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" "NAME.x" ## [7] "LSAD" "ALAND" "AWATER" "NAME.y" "c_race" "c_white" ## [13] "c_black" "c_asian" "c_latinx" "state_fips" "county_fips" "tract_fips" ## [19] "med_rent" "med_hhinc" "c_tenants" "c_owners" "c_renters" "c_movers" ## [25] "c_stay" "c_movelocal" "c_movecounty" "c_movestate" "c_moveabroad" "c_commute" ## [31] "c_car" "c_carpool" "c_transit" "c_bike" "c_walk" "year" ## [37] "p_white" "p_black" "p_asian" "p_latinx" "p_owners" "p_renters" ## [43] "p_stay" "p_movelocal" "p_movecounty" "p_movestate" "p_moveabroad" "p_car" ## [49] "p_carpool" "p_transit" "p_bike" "p_walk" "geometry" |
r colnames(schools_sf) |
## [1] "Site" "Address" "City" "State" "Type" "API" "Org" "geometry" |
| However, they do have a shared relationship by way of space! |
| So, we’ll use a spatial relationship query to figure out the census tract that each school is in, then associate the tract’s data with that school (as additional data in the school’s row). This is a spatial join! |
In this case, let’s say we’re interested in the relationship between the median household income in a census tract (tracts_acs_sf_ac$med_hhinc) and a school’s Academic Performance Index (schools_gdf$API).
To start, let’s take a look at the distributions of our two variables of interest.
head(tracts_acs_sf_ac)
## Simple feature collection with 6 features and 52 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
## GEOID STATEFP COUNTYFP TRACTCE AFFGEOID NAME.x LSAD ALAND AWATER
## 1 06001400100 06 001 400100 1400000US06001400100 4001 CT 6894334 0
## 2 06001400200 06 001 400200 1400000US06001400200 4002 CT 587453 0
## 3 06001400300 06 001 400300 1400000US06001400300 4003 CT 1105329 0
## 4 06001400400 06 001 400400 1400000US06001400400 4004 CT 714729 0
## 5 06001400500 06 001 400500 1400000US06001400500 4005 CT 590307 0
## 6 06001400600 06 001 400600 1400000US06001400600 4006 CT 297856 0
## NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4001, Alameda County, California 3115 2055 128 592 104
## 2 Census Tract 4002, Alameda County, California 2025 1436 59 183 178
## 3 Census Tract 4003, Alameda County, California 5000 3458 380 535 364
## 4 Census Tract 4004, Alameda County, California 3843 2551 229 373 420
## 5 Census Tract 4005, Alameda County, California 3786 1885 990 264 334
## 6 Census Tract 4006, Alameda County, California 1638 817 343 144 124
## state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1 6 1 400100 3501 200893 1297 1158 139 3084
## 2 6 1 400200 1902 160536 855 532 323 2012
## 3 6 1 400300 1481 94732 2441 1057 1384 4948
## 4 6 1 400400 1624 113036 1750 653 1097 3754
## 5 6 1 400500 1627 103846 1622 605 1017 3745
## 6 6 1 400600 1640 127232 657 283 374 1587
## c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1 2514 202 120 219 29 1542 864 165
## 2 1827 40 46 39 60 1211 500 40
## 3 4159 223 289 156 121 2975 1252 177
## 4 3440 113 133 46 22 2293 933 240
## 5 3065 309 180 145 46 2325 983 156
## 6 1221 152 146 68 0 1022 370 120
## c_transit c_bike c_walk year p_white p_black p_asian p_latinx p_owners p_renters
## 1 271 0 10 2018 0.6597111 0.04109149 0.19004815 0.03338684 0.8928296 0.1071704
## 2 426 62 57 2018 0.7091358 0.02913580 0.09037037 0.08790123 0.6222222 0.3777778
## 3 835 202 171 2018 0.6916000 0.07600000 0.10700000 0.07280000 0.4330193 0.5669807
## 4 588 188 92 2018 0.6638043 0.05958886 0.09705959 0.10928962 0.3731429 0.6268571
## 5 619 177 94 2018 0.4978870 0.26148970 0.06973059 0.08821976 0.3729963 0.6270037
## 6 268 93 3 2018 0.4987790 0.20940171 0.08791209 0.07570208 0.4307458 0.5692542
## p_stay p_movelocal p_movecounty p_movestate p_moveabroad p_car p_carpool p_transit
## 1 0.8151751 0.06549935 0.03891051 0.07101167 0.009403372 0.5603113 0.10700389 0.1757458
## 2 0.9080517 0.01988072 0.02286282 0.01938370 0.029821074 0.4128819 0.03303055 0.3517754
## 3 0.8405416 0.04506871 0.05840744 0.03152789 0.024454325 0.4208403 0.05949580 0.2806723
## 4 0.9163559 0.03010123 0.03542888 0.01225360 0.005860416 0.4068905 0.10466638 0.2564326
## 5 0.8184246 0.08251001 0.04806409 0.03871829 0.012283044 0.4227957 0.06709677 0.2662366
## 6 0.7693762 0.09577820 0.09199748 0.04284814 0.000000000 0.3620352 0.11741683 0.2622309
## p_bike p_walk geometry
## 1 0.00000000 0.006485084 MULTIPOLYGON (((-122.2469 3...
## 2 0.05119736 0.047068538 MULTIPOLYGON (((-122.2574 3...
## 3 0.06789916 0.057478992 MULTIPOLYGON (((-122.2642 3...
## 4 0.08198866 0.040122111 MULTIPOLYGON (((-122.2618 3...
## 5 0.07612903 0.040430108 MULTIPOLYGON (((-122.2694 3...
## 6 0.09099804 0.002935421 MULTIPOLYGON (((-122.2681 3...
hist(tracts_acs_sf_ac$med_hhinc)
hist(schools_sf$API)
Oh, right! Those pesky schools with no reported APIs (i.e. API == 0)! Let’s drop those.
schools_sf_api = schools_sf[schools_sf$API > 0, ]
hist(schools_sf_api$API)
Much better!
Now, maybe we think there ought to be some correlation between the two variables? As a first pass at this possibility, let’s overlay the two datasets, coloring each one by its variable of interest. This should give us a sense of whether or not similar values co-occur.
tm_shape(tracts_acs_sf_ac) +
tm_polygons(col = 'med_hhinc',
palette = 'RdPu') +
tm_shape(schools_sf_api) +
tm_dots(col = 'API',
palette = 'RdPu',
size = 0.15)
Though it’s hard to say for sure, it certainly looks possible. It would be ideal to scatterplot the variables! But in order to do that, we need to know the median household income in each school’s tract, which means we definitely need our spatial join!
We’ll first take a look at the documentation for the spatial join function, st_join.
?st_join
Looks like the key arguments to consider are: - the two sf data.frames (x and y) - the type of join to run (left), which is a left join if TRUE, or an inner join if FALSE - the spatial relationship query to use (join)
NOTE: - By default st_join is a left join, because left defaults to TRUE.
st_join maintains the geometries of the first sf data.frame input to the operation (i.e. the geometries of x).<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left >
sf data.frame are we joining onto which (i.e. which one is getting the other one’s data added to it)?sf data.frame should be x, which should be y, and should left be TRUE or FALSE?Alright! Let’s run our join!
schools_jointracts = st_join(schools_sf_api, tracts_acs_sf_ac, left=T, join=st_within)
## although coordinates are longitude/latitude, st_within assumes that they are planar
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left >
As always, we want to sanity-check our intermediate result before we rush ahead.
One way to do that is to introspect the structure of the result object a bit.
print(dim(schools_jointracts))
## [1] 325 60
print(dim(schools_sf))
## [1] 550 8
print(dim(tracts_acs_sf_ac))
## [1] 361 53
head(schools_jointracts)
## Simple feature collection with 6 features and 59 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -122.2616 ymin: 37.739 xmax: -122.2348 ymax: 37.76911
## geographic CRS: NAD83
## Site Address City State Type API Org GEOID
## 1 Amelia Earhart Elementary 400 Packet Landing Rd Alameda CA ES 933 Public 06001428302
## 2 Bay Farm Elementary 200 Aughinbaugh Way Alameda CA ES 932 Public 06001428302
## 3 Donald D. Lum Elementary 1801 Sandcreek Way Alameda CA ES 853 Public 06001428500
## 4 Edison Elementary 2700 Buena Vista Ave Alameda CA ES 927 Public 06001427100
## 5 Frank Otis Elementary 3010 Fillmore St Alameda CA ES 894 Public 06001428200
## 6 Franklin Elementary 1433 San Antonio Ave Alameda CA ES 893 Public 06001427900
## STATEFP COUNTYFP TRACTCE AFFGEOID NAME.x LSAD ALAND AWATER
## 1 06 001 428302 1400000US06001428302 4283.02 CT 2234427 474770
## 2 06 001 428302 1400000US06001428302 4283.02 CT 2234427 474770
## 3 06 001 428500 1400000US06001428500 4285 CT 624197 383522
## 4 06 001 427100 1400000US06001427100 4271 CT 1025446 168187
## 5 06 001 428200 1400000US06001428200 4282 CT 1366564 800242
## 6 06 001 427900 1400000US06001427900 4279 CT 825214 0
## NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4283.02, Alameda County, California 7501 3114 273 3070 562
## 2 Census Tract 4283.02, Alameda County, California 7501 3114 273 3070 562
## 3 Census Tract 4285, Alameda County, California 3152 1293 268 940 378
## 4 Census Tract 4271, Alameda County, California 3768 2446 67 574 457
## 5 Census Tract 4282, Alameda County, California 6643 3468 387 1589 686
## 6 Census Tract 4279, Alameda County, California 5031 2554 228 1407 563
## state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1 6 1 428302 1946 147667 2540 2015 525 7436
## 2 6 1 428302 1946 147667 2540 2015 525 7436
## 3 6 1 428500 1873 88333 1422 485 937 3125
## 4 6 1 427100 1767 135833 1420 1040 380 3724
## 5 6 1 428200 1855 103781 2641 1468 1173 6587
## 6 6 1 427900 1712 102077 2030 734 1296 4971
## c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1 6705 395 99 175 62 3812 2595 296
## 2 6705 395 99 175 62 3812 2595 296
## 3 2641 282 102 100 0 1514 910 65
## 4 3498 121 86 0 19 1755 986 136
## 5 6155 205 71 141 15 3391 2189 229
## 6 4470 108 244 130 19 3102 1705 380
## c_transit c_bike c_walk year p_white p_black p_asian p_latinx p_owners p_renters
## 1 409 18 73 2018 0.4151446 0.03639515 0.4092788 0.07492334 0.7933071 0.2066929
## 2 409 18 73 2018 0.4151446 0.03639515 0.4092788 0.07492334 0.7933071 0.2066929
## 3 374 50 18 2018 0.4102157 0.08502538 0.2982234 0.11992386 0.3410689 0.6589311
## 4 303 33 64 2018 0.6491507 0.01778132 0.1523355 0.12128450 0.7323944 0.2676056
## 5 510 51 108 2018 0.5220533 0.05825681 0.2391992 0.10326660 0.5558501 0.4441499
## 6 624 41 52 2018 0.5076526 0.04531902 0.2796661 0.11190618 0.3615764 0.6384236
## p_stay p_movelocal p_movecounty p_movestate p_moveabroad p_car p_carpool p_transit
## 1 0.9016945 0.05311996 0.01331361 0.02353416 0.008337816 0.6807450 0.07764953 0.1072928
## 2 0.9016945 0.05311996 0.01331361 0.02353416 0.008337816 0.6807450 0.07764953 0.1072928
## 3 0.8451200 0.09024000 0.03264000 0.03200000 0.000000000 0.6010568 0.04293263 0.2470277
## 4 0.9393126 0.03249194 0.02309345 0.00000000 0.005102041 0.5618234 0.07749288 0.1726496
## 5 0.9344163 0.03112191 0.01077881 0.02140580 0.002277213 0.6455323 0.06753170 0.1503981
## 6 0.8992154 0.02172601 0.04908469 0.02615168 0.003822169 0.5496454 0.12250161 0.2011605
## p_bike p_walk geometry
## 1 0.004721931 0.01915005 POINT (-122.2388 37.74476)
## 2 0.004721931 0.01915005 POINT (-122.2519 37.739)
## 3 0.033025099 0.01188904 POINT (-122.2589 37.76206)
## 4 0.018803419 0.03646724 POINT (-122.2348 37.76525)
## 5 0.015039811 0.03184901 POINT (-122.2381 37.75396)
## 6 0.013217279 0.01676338 POINT (-122.2616 37.76911)
Confirmed! The output of the our st_join operation is an sf data.frame (schools_jointracts) with: - a row for each school that is located inside a census tract (all of them are) - the point geometry of that school - all of the attribute data columns (non-geometry columns) from both input sf data.frames
Let’s also take a look at an overlay map of the schools on the tracts. If we color the schools categorically by their tracts IDs, then we should see that all schools within a given tract polygon are the same color.
tm_shape(tracts_acs_sf_ac) +
tm_polygons(col='white', border.col='black') +
tm_shape(schools_jointracts) +
tm_dots(col='GEOID', size=0.2)
## Warning: Number of levels of the variable "GEOID" is 208, which is larger than max.categories
## (which is 30), so levels are combined. Set tmap_options(max.categories = 208) in the layer
## function to show all levels.
Fantastic! That looks right!
Now we can create that scatterplot we were thinking about!
plot(schools_jointracts$med_hhinc, schools_jointracts$API,
xlab = 'median household income ($)',
ylab = 'API')
Wow! Just as we suspected based on our overlay map, there’s a pretty obvious, strong, and positive correlation between median household income in a school’s tract and the school’s API.
We just saw that a spatial join in one way to leverage the spatial relationship between two datasets in order to create a new, synthetic dataset.
An aggregation is another way we can generate new data from this relationship. In this case, for each feature in one dataset we find all the features in another dataset that satisfy our chosen spatial relationship query with it (e.g. within, intersects), then aggregate them using some summary function (e.g. count, mean).
Let’s take this for a spin with our data. We’ll count all the schools within each census tract.
We could do this using an aspatial group-by operation on the GEOID column of the new, spatially joined dataset that we just made. However, since we’re in a geospatial workshop let’s use a spatial aggregation instead!
(Also, to get the correct count, lets use all our schools, not just those with APIs > 0.)
schools_for_count = schools_sf['geometry']
schools_for_count$count = 1
schools_countsbytract = sf:::aggregate.sf(x=schools_for_count, by=tracts_acs_sf_ac, FUN=sum)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
Let’s see what we got out.
print("Counts, rows and columns:")
## [1] "Counts, rows and columns:"
print(dim(schools_countsbytract))
## [1] 361 2
print("Tracts, rows and columns:")
## [1] "Tracts, rows and columns:"
print(dim(tracts_acs_sf_ac))
## [1] 361 53
# take a look at the data
head(schools_countsbytract)
## Simple feature collection with 6 features and 1 field
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
## count geometry
## 1 1 MULTIPOLYGON (((-122.2469 3...
## 2 1 MULTIPOLYGON (((-122.2574 3...
## 3 NA MULTIPOLYGON (((-122.2642 3...
## 4 2 MULTIPOLYGON (((-122.2618 3...
## 5 1 MULTIPOLYGON (((-122.2694 3...
## 6 NA MULTIPOLYGON (((-122.2681 3...
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left >
As a sanity-check, we can now map the school counts for all census tracts.
tm_shape(schools_countsbytract) +
tm_polygons(col='count') +
tm_shape(schools_sf) +
tm_dots(col='black', alpha=0.75, size=0.1)
As we mentioned, the spatial aggregation workflow that we just put together above could have been used not to generate a new count variable, but also to generate any other new variable the results from calling an aggregation function on an attribute column.
In this case, we want to calculate and map the mean API of the schools in each census tract.
Copy and paste code from above where useful, then tweak and/or add to that code such that your new code: 1. joins the schools onto the tracts (HINT: make sure to decide whether or not you want to include schools with API = 0!) 1. dissolves that joined object by the tract IDs, giving you a new GeoDataFrame with each tract’s mean API (HINT: because this is now a different calculation, different problems may arise and need handling!) 1. plots the tracts, colored by API scores (HINT: overlay the schools points again, visualizing them in a way that will help you visually check your results!)
To see the solution, double-click the Markdown cell below.
# YOUR CODE HERE:
We discussed how we can combine datasets to enhance any geospatial data analyses you could do. Key concepts include: - Attribute joins - merge() - Spatial joins (order matters!) - st_join - Aggregation - aggregate.sf
<div style="font-size:larger"> D-Lab @ University of California - Berkeley</div>
<div> Team Geo<div>